An Integrated Approach for Relation Extraction from Wikipedia Texts

Authors

  • Yulan Yan
  • Yutaka Matsuo
  • Mitsuru Ishizuka

Abstract

Linguistic-based methods and Web mining-based methods are the two leading types of methods for the semantic relation extraction task. By integrating linguistic analysis with frequent Web information, this paper presents an unsupervised relation extraction approach for discovering and enhancing relations in which a specified concept participates. We focus on concepts described in Wikipedia articles. By making use of the characteristics of Wikipedia and the Web corpus, we define a novel distance function and develop a linear clustering algorithm over a combination of two kinds of patterns: dependency patterns from dependency analysis of texts in Wikipedia, and surface patterns generated from highly redundant information in the Web corpus. Experiments on two different domains demonstrate the superiority of our approach compared with a previous method. In essence, our approach shows how deep linguistic features complement Web surface features to generate a broad variety of relations.
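
The abstract does not spell out the distance function or the clustering procedure, so the following is only a minimal Python sketch of the general idea, not the authors' method. It assumes that each (concept, entity) pair carries a set of dependency patterns and a set of surface patterns, that the distance is 1 minus a weighted mix of Jaccard similarities over the two sets (the weight `alpha` and cutoff `threshold` are hypothetical parameters), and that "linear" means a single pass over the pairs. All names (`PairFeatures`, `Cluster`, `distance`, `linear_cluster`) and the toy patterns are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PairFeatures:
    """Hypothetical record: one (concept, entity) pair and its two pattern sets."""
    pair: tuple[str, str]
    dep: set[str]   # dependency patterns, e.g. from parsed Wikipedia sentences
    surf: set[str]  # surface patterns, e.g. from redundant Web snippets


@dataclass
class Cluster:
    """A cluster represented by the union of its members' patterns."""
    members: list[PairFeatures] = field(default_factory=list)
    dep: set[str] = field(default_factory=set)
    surf: set[str] = field(default_factory=set)

    def add(self, item: PairFeatures) -> None:
        self.members.append(item)
        self.dep |= item.dep
        self.surf |= item.surf


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity; two empty sets give 0.0 (no shared evidence)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def distance(item: PairFeatures, cluster: Cluster, alpha: float = 0.5) -> float:
    """Assumed distance: 1 minus a weighted mix of the two Jaccard similarities."""
    sim = (alpha * jaccard(item.dep, cluster.dep)
           + (1 - alpha) * jaccard(item.surf, cluster.surf))
    return 1.0 - sim


def linear_cluster(items: list[PairFeatures], threshold: float = 0.7,
                   alpha: float = 0.5) -> list[Cluster]:
    """Single pass: join the nearest cluster if close enough, else open a new one."""
    clusters: list[Cluster] = []
    for item in items:
        best, best_d = None, float("inf")
        for c in clusters:
            d = distance(item, c, alpha)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            best.add(item)
        else:
            new_cluster = Cluster()
            new_cluster.add(item)
            clusters.append(new_cluster)
    return clusters


# Toy input, invented for illustration only.
items = [
    PairFeatures(("Apple", "Steve Jobs"), {"nsubj<-found->dobj"},
                 {"Y founded X", "X co-founder Y"}),
    PairFeatures(("Microsoft", "Bill Gates"), {"nsubj<-found->dobj"},
                 {"Y founded X"}),
    PairFeatures(("Apple", "Cupertino"), {"prep_in<-headquarter"},
                 {"X headquartered in Y"}),
]
clusters = linear_cluster(items)
for c in clusters:
    print([m.pair for m in c.members])
```

With these toy patterns, the two founder pairs share their dependency pattern and half of their surface patterns (distance 0.25), so they land in one cluster, while the headquarters pair shares nothing and opens a second cluster. Representing a cluster by the union of its members' patterns keeps each assignment step cheap, but it also makes the result depend on the input order.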

Similar resources

Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web

This paper presents an unsupervised relation extraction method for discovering and enhancing relations in which a specified concept in Wikipedia participates. Using respective characteristics of Wikipedia articles and Web corpus, we develop a clustering approach based on combinations of patterns: dependency patterns from dependency analysis of texts in Wikipedia, and surface patterns generated ...

Wikipedia Link Structure and Text Mining for Semantic Relation Extraction

Wikipedia, a collaborative Wiki-based encyclopedia, has become a huge phenomenon among Internet users. It covers a huge number of concepts in various fields such as Arts, Geography, History, Science, Sports and Games. Since it is becoming a database storing all human knowledge, Wikipedia mining is a promising approach that bridges the Semantic Web and the Social Web (a.k.a. Web 2.0). In fact, i...

Multi-view Bootstrapping for Relation Extraction by Exploring Web Features and Linguistic Features

Binary semantic relation extraction from Wikipedia is particularly useful for various NLP and Web applications. Currently, frequent pattern mining-based methods and syntactic analysis-based methods are the two leading types of methods for the semantic relation extraction task. With a novel view on integrating syntactic analysis of Wikipedia text with redundancy information from the Web, we propose a mult...

An Integrated Probabilistic and Logic Approach to Encyclopedia Relation Extraction with Multiple Features

We propose a new integrated approach based on Markov logic networks (MLNs), an effective combination of probabilistic graphical models and first-order logic for statistical relational learning, to extracting relations between entities in encyclopedic articles from Wikipedia. The MLNs model entity relations in a unified undirected graph collectively using multiple features, including contextual, ...

Meronymy Extraction Using An Automated Theorem Prover

In this paper, we present a truly semantic-oriented approach for meronymy relation extraction. Instead of syntactic trees or surface representations, it operates directly on semantic networks (SNs). These SNs are derived from texts (in our case, the German Wikipedia) by a deep linguistic syntactico-semantic analysis. The extraction of meronym/holonym pairs is carried out by using, among other...

Publication date: 2009